word hypothesis
A Simple and Effective Unsupervised Word Segmentation Approach
Chen, Songjian (Sun Yat-sen University) | Xu, Yabo (Sun Yat-sen University) | Chang, Huiyou (Sun Yat-sen University)
In this paper, we propose a new unsupervised approach for word segmentation. The core idea of our approach is a novel word induction criterion called WordRank, which estimates the goodness of word hypotheses (character or phoneme sequences). We devise a method to derive exterior word boundary information from the link structures of adjacent word hypotheses and incorporate interior word boundary information to complete the model. In light of WordRank, word segmentation can be modeled as an optimization problem. A Viterbi-style algorithm is developed to search for the optimal segmentation. Extensive experiments conducted on phonetic transcripts as well as standard Chinese and Japanese data sets demonstrate the effectiveness of our approach. On the standard Brent version of the Bernstein-Ratner corpus, our approach outperforms the state-of-the-art Bayesian models by more than 3%. Moreover, our approach is simpler and more efficient than the Bayesian methods. Consequently, our approach is more suitable for real-world applications.
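As a rough illustration of the Viterbi-style search mentioned in the abstract, the following Python sketch finds the highest-scoring segmentation by dynamic programming. It is a minimal sketch, not the authors' implementation: the `toy_score` table merely stands in for WordRank (whose definition is not given here), and `segment`, `max_len`, and the choice to multiply per-word scores are assumptions made for the example.

```python
import math

# Hypothetical goodness scores standing in for WordRank values.
TOY_SCORES = {"the": 0.5, "dog": 0.4, "he": 0.1, "do": 0.1, "t": 0.01, "g": 0.01}

def toy_score(word):
    """Return a small positive score for unseen hypotheses (an assumption)."""
    return TOY_SCORES.get(word, 1e-6)

def segment(text, score, max_len=4):
    """Return the segmentation of `text` maximizing the product of word scores."""
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best log-score of any split of text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word on that path
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            s = score(text[start:end])
            if s <= 0:
                continue  # a zero score rules the hypothesis out
            cand = best[start] + math.log(s)
            if cand > best[end]:
                best[end] = cand
                back[end] = start
    # Follow the back-pointers from the end of the text to recover the words.
    words = []
    i = n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]

result = segment("thedog", toy_score)
```

With the toy table above, `result` is `["the", "dog"]`. Working in log space keeps the product of many small scores from underflowing, and `max_len` bounds the inner loop so the search runs in time linear in the length of the input.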
Grammatical Error Detection for Corrective Feedback Provision in Oral Conversations
Lee, Sungjin (Pohang University of Science and Technology (POSTECH)) | Noh, Hyungjong (Pohang University of Science and Technology (POSTECH)) | Lee, Kyusong (Pohang University of Science and Technology (POSTECH)) | Lee, Gary Geunbae (Pohang University of Science and Technology (POSTECH))
The demand for computer-assisted language learning systems that can provide corrective feedback on language learners’ speaking has increased. However, it is not a trivial task to detect grammatical errors in oral conversations because of the unavoidable errors of automatic speech recognition systems. To provide corrective feedback, a novel method to detect grammatical errors in speaking performance is proposed. The proposed method consists of two sub-models: the grammaticality-checking model and the error-type classification model. We automatically generate grammatical errors that learners are likely to commit and construct error patterns based on the articulated errors. When a particular speech pattern is recognized, the grammaticality-checking model performs a binary classification based on the similarity between the error patterns and the recognition result using the confidence score. The error-type classification model chooses the error type based on the most similar error pattern and the error frequency extracted from a learner corpus. The grammaticality-checking model outperformed the two comparison models by 56.36% and 42.61% in F-score while keeping the false positive rate very low. The error-type classification model exhibited very high performance with a 99.6% accuracy rate. Because high precision and a low false positive rate are important criteria for the language-tutoring setting, the proposed method will be helpful for intelligent computer-assisted language learning systems.
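The two-stage pipeline described above can be pictured with a small Python sketch. Everything specific here is an assumption for illustration only: the abstract does not define the similarity measure, how the confidence score enters the decision, or how frequency is combined with similarity, so the sketch uses token-level `difflib` similarity, a hypothetical `threshold`, and a simple similarity-times-frequency score. The example sentences, error-type labels, and frequencies are invented.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Token-level similarity in [0, 1] between two utterances."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()

def detect(recognized, confidence, correct, error_patterns, type_freq, threshold=0.5):
    """Return (is_error, error_type) for one recognition result.

    Stage 1 (grammaticality check, assumed rule): flag an error when the
    confidence-weighted similarity to the closest error pattern passes
    `threshold` and that pattern is closer than the correct sentence.
    Stage 2 (error type, assumed rule): pick the type maximizing
    similarity * relative frequency of that type in a learner corpus.
    """
    scored = [(similarity(recognized, pat), etype) for pat, etype in error_patterns]
    best_sim = max(sim for sim, _ in scored)
    is_error = (confidence * best_sim > threshold
                and best_sim > similarity(recognized, correct))
    if not is_error:
        return False, None
    _, best_type = max(scored, key=lambda p: p[0] * type_freq.get(p[1], 0.0))
    return True, best_type

# Invented example data.
correct = "i went to the store"
patterns = [("i go to the store", "verb_tense"),
            ("i went to store", "article_omission")]
freq = {"verb_tense": 0.6, "article_omission": 0.4}

flagged = detect("i go to the store", 0.9, correct, patterns, freq)
clean = detect("i went to the store", 0.9, correct, patterns, freq)
```

In this toy setup, `flagged` is `(True, "verb_tense")` and `clean` is `(False, None)`. Requiring the error pattern to beat the correct sentence is one way to keep the false positive rate low, which the abstract names as a key criterion for tutoring.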
The HEARSAY-II speech understanding system: Integrating knowledge to resolve uncertainty
The Hearsay-II speech-understanding system (SUS) developed at Carnegie-Mellon University recognizes connected speech in a 1000-word vocabulary with correct interpretations for 90 percent of test sentences. Its basic methodology involves the application of symbolic reasoning as an aid to signal processing. A marriage of general artificial intelligence techniques with specific acoustic and linguistic knowledge was needed to accomplish satisfactory speech understanding. This research was supported chiefly by Defense Advanced Research Projects Agency contract F44620-73-C-0074 to Carnegie-Mellon University. In addition, support for the preparation of this paper was provided by USC/ISI, Rand, and the University of Massachusetts. We gratefully acknowledge their support. Views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official opinion or policy of DARPA, the U.S. government, or any other person or agency connected with them.
Prototypes and production rules: An approach to knowledge representation for hypothesis formation
Hayes-Roth, Frederick (The RAND Corporation)
Using the concepts of stimulus and response frames of scheduled knowledge source instantiations, competition among alternative responses, goals, and the desirability of a knowledge source instantiation, a general attentional control mechanism is developed. This general focusing mechanism facilitates the experimental evaluation of a variety of specific attentional control policies (such as best-first, bottom-up, and top-down search strategies) and allows the modular addition of specialized heuristics for the speech understanding task. Empirical results demonstrate the effectiveness of the focusing principles, and possible directions for future research are considered.
INTRODUCTION
The Hearsay-II (HSII) speech understanding system (Lesser et al., 1974; Erman & Lesser, 1975; Lesser & Erman, 1977) is a complex, distributed-logic processing system. Inputs to the system are temporal sequences of sets of acoustic segments and associated hypothesized phonetic labels.
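The idea of one general focusing mechanism with interchangeable control policies can be sketched in Python as a priority agenda whose ordering function is swappable. This is only an illustration under stated assumptions, not the Hearsay-II scheduler: the `KSI` record, its `level` and `credibility` fields, the three policy functions, and the knowledge source names are all hypothetical stand-ins for the stimulus and response frames and desirability ratings mentioned above.

```python
import heapq
from dataclasses import dataclass

@dataclass
class KSI:
    """A pending knowledge source instantiation (hypothetical fields)."""
    name: str
    level: int          # abstraction level of the response frame (0 = lowest)
    credibility: float  # rating of the stimulus hypotheses, in [0, 1]

# Each policy maps a pending instantiation to a desirability value.
def best_first(ksi):
    return ksi.credibility   # prefer the most credible stimulus

def bottom_up(ksi):
    return -ksi.level        # prefer work at the lowest level

def top_down(ksi):
    return ksi.level         # prefer work at the highest level

def run_order(pending, policy):
    """Return names in the order a scheduler would run them under `policy`."""
    # heapq is a min-heap, so negate desirability; the index breaks ties stably.
    heap = [(-policy(k), i, k) for i, k in enumerate(pending)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, ksi = heapq.heappop(heap)
        order.append(ksi.name)
    return order

pending = [
    KSI("word-hypothesizer", level=1, credibility=0.4),
    KSI("segment-labeler", level=0, credibility=0.7),
    KSI("phrase-parser", level=2, credibility=0.9),
]

orders = {p.__name__: run_order(pending, p) for p in (best_first, bottom_up, top_down)}
```

Because the agenda only sees a desirability number, a new heuristic can be added as one more policy function without touching the scheduling loop, which mirrors the modularity the abstract attributes to the focusing mechanism.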